Abstract

A new version of DNA walks is introduced in which nucleotides contribute unequally to a walk, which allows us to study the "fine structure" of nucleotide sequences thoroughly. The approach is based on the assumption that nucleotides have an inner abstract characteristic, the determinative degree, which reflects phenomenological properties of the genetic code and is adjusted to the physical properties of nucleotides. We consider each codon position independently, which gives three separate walks characterized by different angles and lengths; such an object is called a triander, which reflects the "strength" of each branch. A general method of identifying a DNA sequence "by triander", which can be treated as a unique "genogram" or "gene passport", is proposed. Two- and three-dimensional trianders are considered. The difference in sequence fine structure between genes and the intergenic space is shown. A clear triplet signal is found in coding loci; it is absent in the intergenic space and is independent of the sequence length. A topological classification of trianders is presented, which can allow detailed signatures of functionally different genomic regions to be worked out.
1 Introduction

Genomic DNA sequence analysis using a wide range of statistical methods and various symmetry investigations is an extremely important tool for extracting hidden information about the dynamic process of evolution, especially now that fully sequenced genomes are available [14]. One of the most promising approaches is the method of DNA walks (first introduced by Azbel), or genomic landscapes, which is based on mapping a sequence into a one-, two- or multidimensional metric space according to various specific rules. In brief, when a DNA walk is drawn, the corresponding mapping assigns a direction (unit vector) to each nucleotide, to each dinucleotide, or to each purine (pyrimidine). The resulting broken lines give a visual presentation of a formal sequence of 4 symbols, in which inhomogeneous regions, fluctuations, "patches" etc. are immediately seen. A modification of the DNA walks method deals with each codon position independently, which gives three separate broken lines characterized by different angles and lengths [21]; addition and subtraction of DNA walks were also considered [22].

Here we introduce a new version of DNA walks, in which the 4 nucleotides are regarded as unequal in the sense that their contributions to a walk differ not only in direction but also in modulus. This follows from the assumption [23] that nucleotides have an inner abstract characteristic, the determinative degree [21], which reflects phenomenological properties of the genetic code and is adjusted to the physical properties of nucleotides.

2 Genetic code redundancy, doublet matrix inner structure and determinative degree

As is well known, the genetic code is a highly organized system [22] with several general properties: it is triplet, unique, nonoverlapping, commaless and redundant (degenerate), the last meaning that most amino acids can be specified by more than one codon [27].

From the 64 possible codons one can extract 16 families, each defined by its first two nucleotides. Let us denote a triplet (5'-1-2-3-3') by XYZ. In 8 unmixed families the codon sense is fully determined by the first two nucleotides X and Y independently of the third nucleotide Z (all 4 codons encode the same amino acid). In the 8 mixed families several patterns of assignment exist; in 6 of them the pyrimidine codons (Z = C, U) determine one amino acid, and the purine codons (Z = A, G) determine another amino acid or, in one family, termination signals. It was found that two thirds of all DNA bases are identical for all organisms owing to the first two nucleotides of a triplet, and that the variability of DNA composition is given by the third base [28, 26].

All 16 doublets XY can be presented as the canonical matrix (1), called the "rhombic code". They are grouped into 2 octets distinguished by their ability to determine the amino acid: the 8 doublets CC, AC, GC, CU, GU, UC, CG, GG determine the amino acid independently of the third base (upper part of (1)), and so they can be called "strong"; the other 8 doublets AA, AU, UU, CA, GA, UG, AG, UA (lower part of (1)), for which the third base determines the content of the codon, can be called "weak" [29, 32]. The "strong" set of doublets has the relative content C:G:U:A = 7:5:3:1, while the "weak" set has the reverse content C:G:U:A = 1:3:5:7 [33]. Note that there is only one A in the "strong" octet and only one C in the "weak" octet. All 4 doublets with Y = C completely determine the amino acid, only 2 doublets with Y = G and 2 with Y = U completely determine it, while doublets with Y = A never determine the amino acid. Thus, the 4 nucleotides can be arranged in the descending order C, G, U, A by their determinative ability ("strength").
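The stated base contents of the two octets can be checked by direct counting. The short Python sketch below uses only the doublet lists given in the text; the helper names `base_content` and `second_base_counts` are illustrative and are not taken from the original work.

```python
from collections import Counter

# Octets exactly as listed in the text (doublet = first two codon letters).
STRONG = ["CC", "AC", "GC", "CU", "GU", "UC", "CG", "GG"]
WEAK = ["AA", "AU", "UU", "CA", "GA", "UG", "AG", "UA"]
ORDER = "CGUA"  # descending determinative ability


def base_content(doublets):
    """Occurrences of C, G, U, A among all letters of the doublets."""
    counts = Counter("".join(doublets))
    return [counts[b] for b in ORDER]


def second_base_counts(doublets):
    """Number of doublets having Y = C, G, U, A in the second position."""
    counts = Counter(d[1] for d in doublets)
    return [counts[b] for b in ORDER]


strong_content = base_content(STRONG)
weak_content = base_content(WEAK)
strong_by_second = second_base_counts(STRONG)
print(strong_content, weak_content, strong_by_second)
```

The third list counts the "strong" doublets by their second letter, which is the quantity behind the ordering C, G, U, A.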

We introduce a numerical characteristic of the empirical "strength", the determinative degree $d_X$ of nucleotide X, in the following way:

$$
\begin{array}{llll}
\text{Pyrimidine} & \mathrm{C} & d_C = 4 & \text{very "strong", determines the amino acid completely} \\
\text{Purine} & \mathrm{G} & d_G = 3 & \text{determines the amino acid in 2 cases} \\
\text{Pyrimidine} & \mathrm{T/U} & d_T = 2 & \text{determines the amino acid in 2 cases} \\
\text{Purine} & \mathrm{A} & d_A = 1 & \text{never determines the amino acid}
\end{array}
\qquad (2)
$$

which allows us to make the transition from a qualitative to a quantitative description of the genetic code structure.

We use the notation T/U because the genetic code is read from mRNA, and so we will not differentiate between their determinative abilities ("strength") in what follows.

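Let us present the four bases (2) as the vector-column

$$
V = \begin{pmatrix} v_1 \\ v_2 \\ v_3 \\ v_4 \end{pmatrix}
  = \begin{pmatrix} C^{(4)} \\ G^{(3)} \\ T^{(2)} \\ A^{(1)} \end{pmatrix}
\qquad (3)
$$

and the corresponding vector-row

$$
V^T = \begin{pmatrix} C^{(4)} & G^{(3)} & T^{(2)} & A^{(1)} \end{pmatrix},
\qquad (4)
$$

where the upper index of a nucleotide denotes its determinative degree. We form the exterior product of the vector-column (3) and the vector-row (4) as follows:

$$
M = V \times V^T =
\begin{pmatrix}
C^{(4)}C^{(4)} & C^{(4)}G^{(3)} & C^{(4)}T^{(2)} & C^{(4)}A^{(1)} \\
G^{(3)}C^{(4)} & G^{(3)}G^{(3)} & G^{(3)}T^{(2)} & G^{(3)}A^{(1)} \\
T^{(2)}C^{(4)} & T^{(2)}G^{(3)} & T^{(2)}T^{(2)} & T^{(2)}A^{(1)} \\
A^{(1)}C^{(4)} & A^{(1)}G^{(3)} & A^{(1)}T^{(2)} & A^{(1)}A^{(1)}
\end{pmatrix}.
\qquad (5)
$$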



It is remarkable that the matrix M (5) fully coincides with the canonical matrix of doublets (1) if and only if the vector V has the determinative degree order C, G, U, A (2). Although there are 4! = 24 ways to place 4 bases in a row, all of them except the one presented in (2) fail to reflect the phenomenological properties of the genetic code. It follows that the intuitive "rhombic code" and the genetic vocabulary have their own inner abstract structure, uniquely defined by the exterior product of the special vectors (3) and (4). This ordering is also adjusted to other schemes, (partially) to the half-time of nucleotide substitution under mutational pressure, and to the nucleotide information weights. These facts allow us to introduce the determinative degree as an abstract variable that serves as a numerical measure of the difference between nucleotides in their ability to determine the sense of a codon [23, 21].

An analogous model for the triplet genetic code can be constructed in the same way using the triple exterior product [23]. We place the doublet matrix M in the XY plane and multiply it by the vector-column V (3) placed along the Z axis, i.e. we construct the triple exterior product

$$
K = V \times M. \qquad (6)
$$

Thus we obtain a three-dimensional matrix over the set of all triplets and, since each codon (except the three terminal ones) corresponds to an amino acid, it can be treated as a cubic matrix model of the genetic code [23].
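As an illustration of the doublet and cubic matrices, the following Python sketch builds both from the ordered base vector. Treating the exterior product as concatenation of base symbols, and the names `DEGREE`, `V`, `M`, `K`, are assumptions made for this sketch only.

```python
# Determinative degrees as given in the text (mRNA alphabet).
DEGREE = {"C": 4, "G": 3, "U": 2, "A": 1}

# Bases ordered by descending determinative degree: C, G, U, A.
V = sorted(DEGREE, key=DEGREE.get, reverse=True)

# Doublet matrix: entry (i, j) is the doublet V[i]V[j].
M = [[x + y for y in V] for x in V]

# Cubic matrix: entry (i, j, k) is the codon V[i]V[j]V[k].
K = [[[xy + z for z in V] for xy in row] for row in M]

codons = [c for plane in K for row in plane for c in row]
for row in M:
    print(" ".join(row))
print(len(codons), "codons,", len(set(codons)), "distinct")
```

Each of the 64 codons appears exactly once in `K`, so the cube can serve as an index over all triplets.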



3 Determinative degree and nucleotide properties

The connection between the bulk DNA structure and various properties of nucleotides has been studied before. It is well known that, by chemical structure, the 4 nitrogenous bases can be divided into:

1) purines (A, G) and pyrimidines (C, T);

2) bases having an amino group (A, C) and bases having a keto group (G, T);

3) bases making 3 (strong) hydrogen bonds (C, G) and bases making 2 (weak) hydrogen bonds (A, T).

They give rise to 3 symmetry transformations:
1) Purine-pyrimidine symmetry
2) Amino-keto symmetry
3) Complementary symmetry (leaving the double helix invariant)
where the even (because the determinant is +1) permutation matrices $\mathcal{R}_{pur}$, $\mathcal{R}_{amino}$, $\mathcal{R}_{compl}$ satisfy

$$
\mathcal{R}_{pur}\,\mathcal{R}_{amino}\,\mathcal{R}_{compl} = I
$$

and two of them, e.g. $\mathcal{R}_{pur}$ and $\mathcal{R}_{compl}$, together with the identity matrix $I$ generate the dihedral group $D_2$, which is the symmetry group of the dihedron, or regular double pyramid, with vertices on the unit sphere (see e.g. [12]). Another representation of this group by 3 x 3 rotational matrices is called a DNA group [13].
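The group relation can be checked with permutations written as dictionaries. In the Python sketch below the complementary map is the usual Watson-Crick pairing; which of the other two double transpositions carries the name "purine-pyrimidine" and which "amino-keto" is an assumption of this sketch, since it only relies on the chemical classes listed above.

```python
BASES = "ACGT"

# Three double transpositions of the four bases (naming of the first two
# is assumed for illustration).
PUR_PYR = {"A": "C", "C": "A", "G": "T", "T": "G"}     # purine <-> pyrimidine
AMINO_KETO = {"A": "G", "G": "A", "C": "T", "T": "C"}  # amino <-> keto
COMPL = {"A": "T", "T": "A", "C": "G", "G": "C"}       # Watson-Crick partner
IDENT = {b: b for b in BASES}


def compose(p, q):
    """Permutation obtained by applying q first and then p."""
    return {b: p[q[b]] for b in BASES}


group = [IDENT, PUR_PYR, AMINO_KETO, COMPL]
triple = compose(PUR_PYR, compose(AMINO_KETO, COMPL))
is_closed = all(compose(p, q) in group for p in group for q in group)
print(triple == IDENT, is_closed)
```

The four maps are closed under composition and each is its own inverse, which is the structure of the group $D_2$.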

The difference in the number of hydrogen bonds causes a different interaction with the complementary nucleotide: each "strong" nucleotide (C and G) has 3 bonds, and the energy of the C-G interaction is -2.4 kcal/mol, while each "weak" nucleotide (T and A) has only 2 bonds, and the energy of the A-T interaction is -1.2 kcal/mol [21]. Each base therefore has its own properties, so dividing the bases into only 2 groups is not sufficient.

We can then ask whether the ordering (2) is adjusted to some physical properties of nucleotides.
Figure 1: Dipole moment of DNA bases calculated by the AM1 method (triangles) and two modifications of MP2 [44] (squares and circles). The corresponding linear fits are: $D_{AM1} = 1.21 + 1.37d$ (R = 0.96); $D_{MP2(1)} = 1.45 + 1.36d$ (R = 0.93); $D_{MP2(2)} = 1.5 + 1.41d$ (R = 0.93).

First, we observe that the dipole moment of the bases grows linearly with the determinative degree, as shown in Fig. 1.

Then we see that the weight of the hydration sites of the bases is also proportional to the determinative degree (Fig. 2).

We can conclude that the determinative degree not only reflects the redundancy of the genetic code in the third position, but is also connected with some energetic properties of the bases themselves.

4 Trianders and their characteristics 

It can be assumed that the phenomenological properties of the genetic code and the inequality of bases (reflected in (2)) will become apparent in real nucleotide sequences. Here we use the determinative degree introduced above to build a new kind of sequence analysis based on a special modification of the DNA walks method.

Figure 2: Weights of hydration sites.

4.1 Triander construction 

We embed a nucleotide sequence into the two-dimensional determinative degree space (DD plane) in the following way. The axis assignment corresponds to the value of the nucleotide determinative degree:

Axis x: {A} = (-1, 0); {T} = (+2, 0),
Axis y: {G} = (0, -3); {C} = (0, +4).

Moving along a sequence produces a walk in the determinative degree space, which we call a determinative degree walk. In general, the current point on the DD plane after i steps has the coordinates

$$
x_i^{DD} = d_T n_T(i) - d_A n_A(i), \qquad (10)
$$
$$
y_i^{DD} = d_C n_C(i) - d_G n_G(i), \qquad (11)
$$

where $n_X(i)$ is the cumulative number of nucleotides X after i steps and $d_X$ is the determinative degree of nucleotide X. The standard DNA walks (genome landscapes) have all $d_X = 1$ in (10)-(11), i.e.

$$
x_i^{standard} = n_T(i) - n_A(i), \qquad (12)
$$
$$
y_i^{standard} = n_C(i) - n_G(i). \qquad (13)
$$
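A minimal Python sketch of the determinative degree walk and of the standard walk is given here. The step vectors follow the axis assignment in the text; the function name `walk` and the decision to skip symbols other than A, C, G, T/U are choices made for this sketch.

```python
# Step vectors on the DD plane, taken from the axis assignment in the text.
DD_STEP = {"A": (-1, 0), "T": (2, 0), "G": (0, -3), "C": (0, 4)}
# Standard DNA walk: same directions, unit modulus.
UNIT_STEP = {"A": (-1, 0), "T": (1, 0), "G": (0, -1), "C": (0, 1)}


def walk(seq, steps=DD_STEP):
    """Return the list of points visited, starting from the origin.

    U is read as T; any other symbol (e.g. N) is skipped, which is a
    choice made for this sketch only.
    """
    x = y = 0
    points = [(0, 0)]
    for base in seq.upper().replace("U", "T"):
        if base not in steps:
            continue
        dx, dy = steps[base]
        x, y = x + dx, y + dy
        points.append((x, y))
    return points


print(walk("ATGC"))             # determinative degree walk
print(walk("ATGC", UNIT_STEP))  # standard walk
```

The last point of `walk(seq)` equals the pair of coordinates obtained from the cumulative counts of the four nucleotides.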
The one-dimensional (purine/pyrimidine) DNA walks are defined by only one coordinate, while x is chosen as the position, i.e.

$$
x_i^{PP} = i, \qquad (14)
$$
$$
y_i^{PP} = n_C(i) + n_T(i) - n_A(i) - n_G(i). \qquad (15)
$$

Therefore, while the "purine/pyrimidine" DNA walks manifestly show the purine/pyrimidine imbalance, and the standard DNA walks (12)-(13) applied to one strand show the DNA asymmetry [10, 11, 12] (violation of Parity Rule 2 [48]), the determinative degree walk (10)-(11) visually shows the "strength" imbalance in one strand.

Then we build 3 independent determinative degree walks, starting from the 1st, 2nd and 3rd nucleotide respectively and moving with step 3 (owing to the triplet structure of the genetic code). In this way we obtain 3 broken lines (branches) starting from the origin, each of which presents the determinative degree walk through the following nucleotide numbers:

1st branch goes through positions 1, 4, 7, 10, 13, ...;

2nd branch goes through positions 2, 5, 8, 11, 14, ...;

3rd branch goes through positions 3, 6, 9, 12, 15, ....

These 3 branches on the determinative degree plane are called a triander.

If the 1st letter corresponds to the first nucleotide of the start codon, then the triander branches represent the nucleotide sets in the three codon positions independently.
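The construction of the three branches can be sketched in Python as follows. The function names are illustrative, and the input is assumed to contain only the letters A, C, G and T/U.

```python
# Step vectors on the DD plane (A = -1 and T = +2 on x; G = -3 and C = +4 on y).
DD_STEP = {"A": (-1, 0), "T": (2, 0), "G": (0, -3), "C": (0, 4)}


def walk(branch):
    """Points of a determinative degree walk, starting from the origin."""
    x = y = 0
    points = [(0, 0)]
    for base in branch:
        dx, dy = DD_STEP[base]  # raises KeyError on unexpected symbols
        x, y = x + dx, y + dy
        points.append((x, y))
    return points


def triander(seq):
    """Three branches: positions 1,4,7,...; 2,5,8,...; 3,6,9,..."""
    seq = seq.upper().replace("U", "T")
    return [walk(seq[offset::3]) for offset in range(3)]


for number, branch in enumerate(triander("ATGCCA"), start=1):
    print(number, branch)
```

Python slices are 0-based, so `seq[0::3]` corresponds to the 1st branch of the text.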

In contrast to previous versions of DNA walks, in which all 4 nucleotides are regarded as equivalent in the sense that they give shifts of equal modulus, in our approach each nucleotide gives a contribution of a different modulus (taken equal to its determinative degree). So, although at first sight we obtain a plot isomorphic to that of [21], trianders show not only the quantitative composition and purely statistical laws of symbol strings, but also reflect the connection between nucleotide sequences and the inner phenomenological properties of the genetic code and the physicochemical properties of the bases.

As an example of a triander we take the dystrophin gene, which is the largest gene found in nature, measuring 2.4 Mb, and is responsible for Duchenne (DMD) and Becker (BMD) muscular dystrophies [49]. The dystrophin RNA is differentially spliced, producing a range of different transcripts that encode a large set of protein isoforms. Dystrophin is a large, rod-like cytoskeletal protein which is found at the inner surface of muscle fibers. The triander for the dystrophin gene is presented in Fig. 3. For comparison we also show the triander for a shuffled sequence of the same nucleotide composition. Obviously, the ideal triander for a uniformly random sequence consists of 3 coinciding lines from the origin with a slope of 45 degrees. This line also corresponds to a symmetric sequence satisfying Parity Rule 2 [48]: $N_C = N_G$, $N_T = N_A$. Such lines are shown on all triander plots below for normalization.
4.2 Determinative degree angle

An important visual characteristic of a triander is the slope of its branches, which we call the determinative degree (DD) angle; for a current point it can be calculated by

$$
\tan \alpha(i) = \frac{4 n_C(i) - 3 n_G(i)}{2 n_T(i) - n_A(i)}.
$$

Here and below $n_X$ denotes the cumulative number of nucleotides X for a given branch. Evidently, for a uniformly random sequence or a symmetric sequence satisfying Parity Rule 2 [48] the angle is 45 degrees (horizontal dashed line in the plots below), so a deviation from this value indicates nontrivial ordering. The plots of the current values of $\alpha$ for the dystrophin gene and for a shuffled sequence of the same nucleotide composition are presented in Fig. 4.
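The current DD angle of a branch can be computed as in the Python sketch below. Using `math.atan2` instead of a bare arctangent of the ratio is a choice made here so that the quadrant is preserved and a zero denominator does not raise an error; the function name `dd_angles` is illustrative.

```python
import math

# Determinative degrees from the text.
DD = {"A": 1, "T": 2, "G": 3, "C": 4}


def dd_angles(branch):
    """Current DD angle in degrees after each step of one branch."""
    n = dict.fromkeys("ACGT", 0)  # cumulative counts n_X(i)
    angles = []
    for base in branch.upper().replace("U", "T"):
        n[base] += 1
        y = DD["C"] * n["C"] - DD["G"] * n["G"]
        x = DD["T"] * n["T"] - DD["A"] * n["A"]
        angles.append(math.degrees(math.atan2(y, x)))
    return angles


angles = dd_angles("ACGT")
print([round(a, 2) for a in angles])
```

For a branch with equal numbers of all four nucleotides the last value is 45 degrees, in agreement with the dashed reference line described in the text.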

We stress that trianders show not only the quantitative composition but also allow local motifs to be found more clearly, because different moduli for the nucleotides lead to fewer superpositions and self-intersections. Trianders also reflect the tendency of the sequence as a whole more accurately, similarly to DNA walks. Thus a triander can be treated as a "picture", "genome passport" or "genogram" of a given sequence.

If we recall that the third base in a codon has maximal redundancy, then the 3rd branch of a triander acquires a definite "physical sense". Let us assume that the determinative degree is an additive variable (which can be done at least in a first approximation [23]); then the 3rd branch shows the current "strength" of the sequence, that is, its "bulk" ability to determine the sense of codons. In this scheme the other two branches can be treated as the 3rd branch with a shifted ORF (Open Reading Frame).
Figure 4: Current DD angle for the triander of the Homo sapiens dystrophin gene (left) and a shuffled sequence of the same nucleotide composition (right).



4.3 Euclidean and Manhattan distances 

As a measure of the "sequence strength" we can choose the length of the radius vector from the origin to the current point of the triander, i.e. the Euclidean distance

$$
D_E(i) = \sqrt{\left(4 n_C(i) - 3 n_G(i)\right)^2 + \left(2 n_T(i) - n_A(i)\right)^2}.
$$

We can also use the Manhattan distance (see footnote 3)

$$
D_M(i) = \left|4 n_C(i) - 3 n_G(i)\right| + \left|2 n_T(i) - n_A(i)\right|,
$$

which is the distance between two points measured along axes at right angles.

For a symmetric sequence (equal numbers of all nucleotides), at step i each nucleotide count equals i/4, so both coordinates equal i/4 and the Euclidean and Manhattan distances are $D_E(i) = i/(2\sqrt{2})$ and $D_M(i) = i/2$ (shown by dashed lines in Figs. 5, 6).
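Both distances follow directly from the cumulative counts. The Python sketch below computes them for one branch and evaluates a toy symmetric branch with equal numbers of all nucleotides; the function names and the toy input are illustrative only.

```python
import math

# Determinative degrees from the text.
DD = {"A": 1, "T": 2, "G": 3, "C": 4}


def dd_point(branch):
    """End point (x, y) of a branch on the DD plane."""
    branch = branch.upper().replace("U", "T")
    n = {b: branch.count(b) for b in "ACGT"}
    x = DD["T"] * n["T"] - DD["A"] * n["A"]
    y = DD["C"] * n["C"] - DD["G"] * n["G"]
    return x, y


def distances(branch):
    """Euclidean and Manhattan distances from the origin to the end point."""
    x, y = dd_point(branch)
    return math.hypot(x, y), abs(x) + abs(y)


steps = 100
d_e, d_m = distances("ACGT" * (steps // 4))  # symmetric toy branch
print(d_e, d_m)
```

With 25 copies of each nucleotide the end point is (25, 25), so the Manhattan distance is 50 = i/2 and the Euclidean distance is 25 times the square root of 2.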



4.4 Visualization of the genetic code triplet nature 

Now we show that the triplet character of the genetic code can be seen directly from the representation of sequences by trianders. As an example we take the Homo sapiens Che-1 mRNA. We additionally consider analogs of trianders with different phases = 4, 5, 7. The result is presented in Fig. 7, from which it is seen that only the case phase = 3 provides nontrivial ordering leading to definite branches; that is, we have a clear visual presentation of the strong triplet signal.

3 Also known as rectilinear distance, and it can be treated as the distance that would be 
traveled to get from one data point to the other if a grid-like path is followed (a car driving in a 
city laid out in square blocks, like Manhattan). 
Figure 5: Current DD Euclidean distance for the triander of the Homo sapiens dystrophin gene (left) and a shuffled sequence of the same nucleotide composition (right).

In this way one could search for higher-phase statistical correlations and possible structures, if any, in nucleotide sequences.
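The generalization from phase 3 to an arbitrary phase amounts to changing the slice step. The Python sketch below uses a hypothetical, perfectly periodic toy sequence (not data from this work) to show how branches separate for phase 3 and collapse onto each other for phase 4.

```python
# Step vectors on the DD plane.
DD_STEP = {"A": (-1, 0), "T": (2, 0), "G": (0, -3), "C": (0, 4)}


def end_point(branch):
    """End point of the determinative degree walk over one branch."""
    x = sum(DD_STEP[b][0] for b in branch)
    y = sum(DD_STEP[b][1] for b in branch)
    return x, y


def branch_end_points(seq, phase=3):
    """End points of the `phase` branches obtained with step `phase`."""
    return [end_point(seq[offset::phase]) for offset in range(phase)]


toy = "CAT" * 40  # hypothetical sequence with an exact period of 3
print(branch_end_points(toy, 3))
print(branch_end_points(toy, 4))
```

For real sequences the separation is statistical rather than exact, so the branches would be compared by eye or by their angles and lengths.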

4.5 Transformations of trianders 

Here we illustrate how the symmetry transformations affect the triander. As an example we take the Homo sapiens dystrophin from Fig. 3; the result of the symmetry transformations (7)-(9) and of reversing the sequence is shown in Fig. 8.

We observe that the reverse triander is very similar to the original one in Fig. 3.
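Transformed sequences of this kind can be produced with simple translation tables, as in the Python sketch below. The complementary map is the usual base pairing; the assignment of the names "purine-pyrimidine" and "amino-keto" to the other two maps is an assumption of this sketch, based on the chemical classes of the bases.

```python
# Translation tables over the DNA alphabet (names of the first two assumed).
PUR_PYR = str.maketrans("ACGT", "CATG")     # A<->C, G<->T
AMINO_KETO = str.maketrans("ACGT", "GTAC")  # A<->G, C<->T
COMPL = str.maketrans("ACGT", "TGCA")       # A<->T, C<->G


def variants(seq):
    """Original, transformed and reversed versions of a sequence."""
    seq = seq.upper().replace("U", "T")
    return {
        "original": seq,
        "purine-pyrimidine": seq.translate(PUR_PYR),
        "amino-keto": seq.translate(AMINO_KETO),
        "complementary": seq.translate(COMPL),
        "reverse": seq[::-1],
    }


for name, s in variants("ATGC").items():
    print(f"{name:18s} {s}")
```

Each variant can then be passed to the same triander construction to compare the resulting plots.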



4.6 Three-dimensional trianders 

The two-dimensional trianders constructed above have a disadvantage: it is not clear where in the sequence a given point lies. To improve this we introduce three-dimensional trianders, which are defined by the formulas

$$
x_i = d_T n_T(i) - d_A n_A(i), \qquad (16)
$$
$$
y_i = d_C n_C(i) - d_G n_G(i), \qquad (17)
$$
$$
z_i = i, \qquad (18)
$$

which can be treated as a mixing of the one-dimensional and two-dimensional cases that takes the determinative degree into account. Then any structure in the DD space can be definitely localized visually using the vertical axis. In Fig. 9 we show the three-dimensional triander of the Homo sapiens zinc finger and its shuffled version. All the graphs start from one point, the origin, and have different lengths (which can be simply calculated from (16)-(18)), characterizing them as a whole.

Figure 6: Current DD Manhattan distance for the triander of the Homo sapiens dystrophin gene (left) and a shuffled sequence of the same nucleotide composition (right).
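A three-dimensional branch can be sketched in Python as follows. The third coordinate is assumed here to be the step index within the branch, by analogy with the position coordinate of the one-dimensional walk; this choice, like the function names, belongs to the sketch.

```python
# Step vectors on the DD plane.
DD_STEP = {"A": (-1, 0), "T": (2, 0), "G": (0, -3), "C": (0, 4)}


def walk3d(branch):
    """Points (x, y, z) of one branch; z is the step index (assumed)."""
    x = y = 0
    points = [(0, 0, 0)]
    for i, base in enumerate(branch, start=1):
        dx, dy = DD_STEP[base]
        x, y = x + dx, y + dy
        points.append((x, y, i))
    return points


def triander3d(seq):
    """Three 3D branches for codon positions 1, 2 and 3."""
    seq = seq.upper().replace("U", "T")
    return [walk3d(seq[offset::3]) for offset in range(3)]


print(triander3d("ATGCCA"))
```

Projecting the points onto the (x, y) plane recovers the two-dimensional triander.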



5 Topological classification of trianders 

Here we propose a topological classification of trianders according to the quadrants of the DD plane to which their branches belong and to the number of intersections and return points. This makes it possible to study the fine structure of sequences of any length with exactly established functions (genes, intergenic space, repeat regions, etc.), to compare various loci, and to search for homologous regions, which can allow us to work out a mathematically rigorous genomic signature formalism [51, 52].

We note that there exist many types of trianders. A triander corresponding to a gene we call a genogram, and a triander corresponding to intergenic space we call a gapgram. If some branches intersect each other, we speak of an intersecting triander; if a branch intersects itself, producing knots, we speak of a knot triander. Branches can also have (multiple) return points, and then we speak of a returned triander. Thus "topologizing" the determinative degree walk in our sense means that we identify trianders having definite topological features (knot, intersection, return point) and place them into a special class.

So we may hope that such a topological classification of trianders can actually help to solve visually the inverse problem: to predict the possible function of a given sequence.

Let $n_X(i)$ be the cumulative number of nucleotides X after i steps; then the DD plane quadrants are defined by (10)-(11), and therefore

I: $2n_T(i) - n_A(i) > 0$; $4n_C(i) - 3n_G(i) > 0$;

II: $2n_T(i) - n_A(i) < 0$; $4n_C(i) - 3n_G(i) > 0$;

III: $2n_T(i) - n_A(i) < 0$; $4n_C(i) - 3n_G(i) < 0$;

IV: $2n_T(i) - n_A(i) > 0$; $4n_C(i) - 3n_G(i) < 0$.
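The quadrant of the current point of a branch can be determined as in the Python sketch below, which uses the usual counterclockwise numbering of quadrants. Returning `None` for points lying on an axis is a convention of this sketch.

```python
# Determinative degrees from the text.
DD = {"A": 1, "T": 2, "G": 3, "C": 4}


def quadrant(x, y):
    """Quadrant label of a point on the DD plane, or None on an axis."""
    if x > 0 and y > 0:
        return "I"
    if x < 0 and y > 0:
        return "II"
    if x < 0 and y < 0:
        return "III"
    if x > 0 and y < 0:
        return "IV"
    return None


def branch_quadrant(branch):
    """Quadrant in which a branch ends, computed from cumulative counts."""
    branch = branch.upper().replace("U", "T")
    n = {b: branch.count(b) for b in "ACGT"}
    x = DD["T"] * n["T"] - DD["A"] * n["A"]
    y = DD["C"] * n["C"] - DD["G"] * n["G"]
    return quadrant(x, y)


print([branch_quadrant(b) for b in ("CT", "CA", "GA", "GT")])
```

Applying `branch_quadrant` to the three branches of a triander gives the quadrant part of its type.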

After examining around 2000 eukaryotic and prokaryotic sequences we found that all trianders can be divided into several types. The first type is a chaotic triander, which has no definite branch structure; the others can be called ordered trianders. To work out a general classification of ordered trianders and a description of their branches we introduce the notation

$$
\text{Type } A\text{-}B\text{-}C_F^{(x,y)}\ (E), \qquad (19)
$$

where A is the quadrant in which the 1st branch lies, B is the quadrant in which the 2nd branch lies, C is the quadrant in which the 3rd branch lies; E is a characteristic of the triander as a whole; the indices F and (x, y) describe properties of the corresponding separate branches (also for A and B), which are explained below.

Figure 8: The trianders for the transformed sequences of Homo sapiens dystrophin (DMD), transcript variant D140ab, mRNA, in the case of the amino-keto symmetry, purine-pyrimidine symmetry, complementary symmetry and reverse sequence reading. The original nontransformed triander is shown in Fig. 3.

In general there are $4^3 = 64$ possible ordered triander types classified by quadrants only. We identify trianders which differ by a cyclic permutation of branches, because this corresponds to an ORF shift, which decreases the number to 24 types. Nevertheless, observation showed that only 7 triander types exist: I-I-I, I-I-II, I-I-III, I-I-IV, I-II-III, I-II-IV, I-III-IV. For example, Type I-I-II includes Types I-II-I and II-I-I, obtained by shifting the ORF by 1 and 2, but in the figures we show the exact triander names (19).
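The count of 24 can be reproduced by enumerating all quadrant triples and identifying those related by a cyclic shift of branches (an ORF shift by 1 or 2). The Python sketch below does this; the helper name `canonical` is illustrative.

```python
from itertools import product

QUADRANTS = ("I", "II", "III", "IV")


def canonical(triple):
    """Representative of a triple under cyclic shifts of its branches."""
    return min(triple[k:] + triple[:k] for k in range(3))


all_types = list(product(QUADRANTS, repeat=3))
cyclic_classes = {canonical(t) for t in all_types}
unordered_classes = {tuple(sorted(t)) for t in all_types}

print(len(all_types), len(cyclic_classes), len(unordered_classes))
```

For comparison, identifying all permutations of branches rather than only cyclic shifts would leave 20 classes.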

If, e.g., a branch crosses from quadrant I to quadrant II, we denote this by the fraction I/II. For instance, the triander of the Homo sapiens dystrophin gene (Fig. 3) is of Type II-I-I/IV.
Figure 9: Three-dimensional trianders of the Homo sapiens zinc finger protein 265 (ZNF265), mRNA (left) and its shuffled version (right).

The additional qualitative features of a triander as a whole observed from sequence examination are

E = sharp, flat, parallel.

For branch properties we have:

x, y denotes the axis to which a branch is parallel,
F = blurry, loop, smooth, oscillative (horizontal, vertical).

Separately we can describe "interaction" of branches as: 

1) Single intersection of A and B is denoted by sign A^B, which gives inter- 
secting triander; 

2) Multiple intersection A and B is called braiding and denoted A B, which 
gives braiding triander. See Fig. El 

We have thoroughly analyzed 150 sequences differing in function and evolutionary level, and for each sequence we also constructed 100 shuffled sequences having the same nucleotide composition but not coinciding with the examined one. The sequences are presented in Tables 1, 2.
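Shuffled controls with the same nucleotide composition can be generated as in the Python sketch below. The function name, the fixed seed and the rejection of copies identical to the input are choices of this sketch; the text does not specify the shuffling procedure beyond these two requirements.

```python
import random
from collections import Counter


def shuffled_copies(seq, n=100, seed=0):
    """Return n shuffled versions of seq, none equal to seq itself."""
    if len(set(seq)) < 2:
        raise ValueError("sequence must contain at least two different letters")
    rng = random.Random(seed)  # fixed seed for reproducibility
    letters = list(seq)
    copies = []
    while len(copies) < n:
        rng.shuffle(letters)
        candidate = "".join(letters)
        if candidate != seq:  # reject a copy coinciding with the original
            copies.append(candidate)
    return copies


original = "ATGGCCATTGTAATGGGCCGC"
controls = shuffled_copies(original, n=5)
print(Counter(original) == Counter(controls[0]))
```

Because shuffling only permutes the letters, every control has exactly the composition of the original sequence.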

For every class we show a typical triander in Fig. 12, where the following real sequences are presented:

a) Chaotic triander. Dengue virus type 1 strain FGA/NA d1d, intergenic space: AF226686.
Table 1: Examined sequences

| Type | Sequence description | Number |
|---|---|---|
| | Arabidopsis thaliana clone 7867 mRNA | |
| | C.elegans essential lethal-805, myotactin complex | 2 |
| | C.elegans heterochronic gene LIN-42 | 1 |
| | C.elegans ribosomal protein L32 | 1 |
| I,I,I | Dros melanogaster chromosome 3R | 1 |
| | HS apoptosis-associated tyrosine kinase | 2 |
| | HS coagulation factor VIII | 1 |
| | HS COL4A6 gene for a6(IV) collagen | 2 |
| | HS cytochrome P450, family 1-8, subfamily A,B,C,F,J | 15 |
| | HS DNA cross-link repair 1A,1C (PSO2 homolog, S. cerevisiae) | 2 |
| | HS dystrophin (DMD) gene, exons, transcript variants | 4 |
| | HS exportin, tRNA | 1 |
| | HS FGG gene for fibrinogen | 2 |
| | HS fibrinogen alpha chain gene, complete mRNA | 4 |
| | HS glutathione synthetase (GSS), mRNA | 2 |
| | HS H19 gene, complete sequence, mRNA | 2 |
| | HS myeloid ecotropic viral integration site mRNA | 1 |
| | HS MESTIT1 antisense RNA, intergenic space | 1 |
| | HS mRNA 10q24.3-qter | 1 |
| | Hs mucosal vascular addressin cell adhesion molecule | 1 |
| | HS phenylalanine hydroxylase (PAH) | 3 |
| | Hs phytanoyl-CoA hydroxylase interacting protein | 1 |
| | HS retinal dystrophin (DMD) gene Exon 30 | 1 |
| | Hs solute carrier family 22 (organic anion transporter) | 1 |
| | HS suppressor of cytokine signaling, intergenic space | 1 |
| | HS syntrophin, alpha 1 | 3 |
| | HS vitelliform macular dystrophy | 2 |
| | HS, chorionic somatotropin hormone | 1 |
| | HS, genomic DNA chrom1, intergenic space | 1 |
| | HS, genomic DNA P450 intergenic space | 1 |
| | Human CYP2D7BP pseudogene for cytochrome P450 2D6 | 1 |
| | Human mRNA encoding placental lactogen hormone | 1 |
| | Human nested gene protein gene | 1 |
| | Mus musculus insulin-like growth factor | 2 |
| | Mus musculus interleukin 11 receptor | 1 |
| | Mus musculus like-glycosyltransferase mRNA | 1 |
| | Mus musculus similar to mitochondrial ribosomal protein S36 | 2 |
| | Rat gene for alpha-fibrinogen | 1 |
| | Rattus norvegicus cytochrome P450 IIA3 mRNA, 3' end | 2 |
| | Similar to FBJ murine osteosarcoma viral oncogene | 1 |
| | Takifugu rubripes DMD gene | 1 |

Table 2: Examined sequences

| Type | Sequence description | Number |
|---|---|---|
| | Dengue virus type 1 strain FGA/NA, complete genome | 1 |
| | HS ATP-binding cassette, sub-family C (CFTR/MRP), mRNA | 3 |
| | HS aldehyde dehydrogenase 1 family | 5 |
| | Human fibrinogen beta-chain mRNA, partial cds 41bl, short asym | 2 |
| | Homo sapiens fibrinogen-like 1 (FGL1), transcript variant | 1 |
| | C.elegans immunoglobulin | 1 |
| | Hs spondyloepiphyseal dysplasia late RNA | 1 |
| | C.elegans nematode cuticle collagen | 1 |
| | C.elegans cuticle collagen family member (28.9 kD) | 1 |
| I,I,IV | Chaetosphaeridium globosum chloroplast, complete genomes | 1 |
| | C. elegans Collagen with Endostatin domain CLE-1 | 1 |
| | Hs aldehyde dehydrogenase 1 family, member A2 (ALDH1A2) | 1 |
| | Hs chromosome 20 open reading frame 1 (C20orf1), mRNA | 1 |
| | HS ATP-binding cassette, sub-family C (CFTR/MRP) mRNA | 2 |
| | Human cytochrome P450 (CYP2A13) gene, complete cds | 1 |
| | Hs cytochrome P450 2S1 (CYP2S1) mRNA, complete cds | 1 |
| | Hs P450 (cytochrome) oxidoreductase (POR) | 1 |
| | Hs cytochrome P450, family 2,39,51,20 | 4 |
| | Hs chromosome 1 MRG1 intergenic space | 1 |
| | Hs cytochrome P450 intergenic space | 1 |
| I,II,IV | HS dystrophin (DMD) D140ab, variants, mRNA | 5 |
| | H. sapiens mRNA for ribosomal protein L30 | 1 |
| | Human mRNA for ribosomal protein L32 | 1 |
| I,I,III | Hs collagen, type IX, alpha 2 (COL9A2), mRNA | 1 |
| | Mus musculus ribosomal protein L32 | 1 |
| | Mus musculus, ribosomal protein L30 BC002060 | 1 |
| | Homo sapiens apoptosis antagonizing transcription factor (AATF) | 5 |
| I,II,III | Homo sapiens H1F5 histone family | 1 |
| | Hs cytochrome P450, family 17, subfamily A, polypeptide 1 | 1 |
| | HS genomic cluster, H1 histone family, member 5 | 1 |
| | HS survival of motor neuron 1, telomeric (SMN1) | 2 |
| | HS Che-1 mRNA, complete cds.l23mRNA | 1 |
| | Hs zinc finger protein 265 (ZNF265) mRNA | 1 |
| I,III,IV | Hs utrophin (homologous to dystrophin) 3bls 11 | 1 |
| | Macaca fascicularis RPL30 mRNA, family | 4 |
| | HS bestrophin (VMD2) mRNA, alternatively spliced product | 1 |
| I,I,II | HS BTG family, member 2 (BTG2), mRNA | 1 |
| | HS Cbp/p300-interacting transactivator, with Glu/Asp-rich | 1 |
| | Human msg1-related gene 1 (mrg1) mRNA | 1 |


b) Type I-I-I. Homo sapiens cytochrome P450, family 2, subfamily F, polypeptide 1 (CYP2F1), mRNA: NM_000774.

c) Type II_blurry-I-I (flat). Homo sapiens Cbp/p300-interacting transactivator, mRNA: NM_006079.

d) Type III_oscill-I_blurry-I_oscill. Homo sapiens collagen, type IX, alpha 2 (COL9A2), mRNA: NM_001852.

e) Type IV-I#I. Caenorhabditis elegans immunoglobulin domain-containing protein family member (106.4 kD), mRNA: NM_171617.

f) Type III-II-I (sharp). Homo sapiens H1 histone family, member 5 (H1F5), mRNA: NM_005322.

g) Type II#I-IV. Homo sapiens dystrophin (muscular dystrophy, Duchenne and Becker types) (DMD), transcript variant D140ab, mRNA: NM_004022.

h) Type I_loop-I-IV. Homo sapiens utrophin (homologous to dystrophin) (UTRN), mRNA: NM_007124.

A more careful topological classification and analysis of two- and three-dimensional trianders can be made using methods of the theory of topological curves or knot theory.



6 Conclusions 

We can conclude that the introduced determinative degree DNA walk method confirms the "mosaic" structure of the genome, shows parts with different nucleotide content and "strength", and so allows us to find the "fine structure" of nucleotide sequences.

We propose a general method of identifying a DNA sequence "by triander", which can be treated as a unique "genogram", "gene passport", etc. Two- and three-dimensional trianders are introduced and their features are studied.

The difference in the fine structure of nucleotide sequences between genes and the intergenic space is shown. There is also a clear triplet signal in coding loci, which is absent in the intergenic space and is independent of the sequence length, depending on composition only. All plots are compared with the corresponding shuffled sequences of the same nucleotide composition, which allows us to separate the real ordering effect from the influence of composition.

We have constructed a classification of trianders, on the basis of which detailed signatures of functionally different genomic regions can be worked out.